73 research outputs found

    Dirichlet mixtures and their contributions to classification and content-based image retrieval

    The development of modern medicine in diagnostic techniques such as radiology, histopathology and tomography has resulted in an explosion in the number and importance of the medical images stored by most hospitals. To help physicians confirm their diagnoses, several medical image retrieval systems have been developed. The design of these systems involves several stages. We believe that summarizing image databases is an important step in any retrieval system. Indeed, categorizing an image database greatly facilitates retrieval and makes it possible to locate the desired images in a minimum of time. In this thesis, we first study the problems common to all image retrieval systems, namely indexing, feature extraction, the definition of similarity measures, and relevance feedback. We also study other categories of problems specific to image retrieval. This study is completed by an analysis of the best-known existing systems. In the second part of the thesis, we focus on Dirichlet mixtures and how they can be exploited for classification, in particular for summarizing image databases. Unlike classical approaches that adopt the normal distribution as the density, we use a generalization of the Dirichlet distribution to better adapt it to real-world problems. Our approach is expressed as a mathematical model based on maximum likelihood and Fisher's method. A very interesting interpretation of our method, based on geometric statistics, is given. Finally, we present contextual and non-contextual evaluations that demonstrate the validity of our method

    On the smoothing of multinomial estimates using Liouville mixture models and applications

    There has been major progress in recent years in statistical model-based pattern recognition, data mining and knowledge discovery. In particular, generative models are widely used and are very reliable in terms of overall performance. The success of these models hinges on their ability to construct a representation which captures the underlying statistical distribution of the data. In this article, we focus on count data modeling. Indeed, this kind of data is naturally generated in many contexts and in different application domains. Usually, models based on the multinomial assumption are used in this case, but they may have several shortcomings, especially in the case of high-dimensional sparse data. We then propose a principled approach to smoothing multinomials using a mixture of Beta-Liouville distributions, learned to reflect and model prior beliefs about the multinomial parameters. Via both theoretical interpretations and experimental validations, we argue that the proposed smoothing model is general and flexible enough to allow accurate representation of count data
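    The smoothing idea above can be illustrated with a minimal sketch. The paper learns a *mixture of Beta-Liouville* priors; the single symmetric Dirichlet prior shown here is only the simplest special case, where the posterior mean has a closed form (observed counts plus pseudo-counts), and all parameter values are illustrative.

```python
# Sketch: smoothing a multinomial MLE with a single Dirichlet prior.
# This is a simplified stand-in for the mixture-of-Beta-Liouville
# smoothing described in the abstract.

def multinomial_mle(counts):
    """Raw maximum-likelihood estimate; assigns zero to unseen events."""
    n = sum(counts)
    return [c / n for c in counts]

def dirichlet_smoothed(counts, alpha):
    """Posterior-mean estimate under a symmetric Dirichlet(alpha) prior:
    each event receives alpha pseudo-counts, so no probability is zero."""
    n = sum(counts)
    k = len(counts)
    return [(c + alpha) / (n + k * alpha) for c in counts]

counts = [5, 0, 3, 0]                    # sparse count vector: two unseen events
print(multinomial_mle(counts))           # zeros survive -> overconfident model
print(dirichlet_smoothed(counts, 1.0))   # every event keeps some mass
```

    The point of the mixture extension is that a single prior pulls all estimates toward one point, whereas a learned mixture of priors can capture several distinct prior beliefs about the parameters.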

    Bayesian learning of inverted Dirichlet mixtures for SVM kernels generation

    We describe approaches for positive data modeling and classification using both finite inverted Dirichlet mixture models and support vector machines (SVMs). Inverted Dirichlet mixture models are used to tackle an outstanding challenge in SVMs, namely the generation of accurate kernels. The kernel generation approaches that we consider, grounded in ideas from information theory, allow the incorporation of the data structure and its structural constraints. Inverted Dirichlet mixture models are learned within a principled Bayesian framework using both the Gibbs sampler and Metropolis-Hastings for parameter estimation, and the Bayes factor for model selection (i.e., determining the number of mixture components). Our Bayesian learning approach uses priors over the model parameters, which we derive by showing that the inverted Dirichlet distribution belongs to the exponential family, and then combines these priors with information from the data to build posterior distributions. We illustrate the merits and the effectiveness of the proposed method with two challenging real-world applications, namely object detection and visual scene analysis and classification
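    One simple way to turn a fitted mixture model into an SVM kernel, in the spirit of the approach above, is to map each point to its vector of posterior component probabilities and take an inner product. The sketch below is hypothetical: it uses 1-D unit-variance Gaussian components for readability, where the paper derives kernels from inverted Dirichlet mixtures.

```python
# Sketch of a mixture-model-based kernel: points assigned to the same
# mixture component get a kernel value near 1, points assigned to
# different components a value near 0. Gaussian responsibilities are
# used here purely for illustration.
import math

def responsibilities(x, means, weights):
    """Posterior p(component j | x) under unit-variance 1-D Gaussians."""
    lik = [w * math.exp(-0.5 * (x - m) ** 2) for m, w in zip(means, weights)]
    s = sum(lik)
    return [l / s for l in lik]

def mixture_kernel(x, y, means, weights):
    """Inner product of the two posterior-probability vectors."""
    rx = responsibilities(x, means, weights)
    ry = responsibilities(y, means, weights)
    return sum(a * b for a, b in zip(rx, ry))

means, weights = [0.0, 4.0], [0.5, 0.5]
print(mixture_kernel(0.1, 0.2, means, weights))  # near 1: same component
print(mixture_kernel(0.1, 3.9, means, weights))  # near 0: different components
```

    A kernel built this way is valid (it is an explicit feature map) and encodes the cluster structure the mixture has learned, which is the sense in which the data structure is incorporated into the SVM.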

    Data-free metrics for Dirichlet and generalized Dirichlet mixture-based HMMs - A practical study.

    Approaches to designing metrics between hidden Markov models (HMMs) can be divided into two classes: data-based and parameter-based. The latter has the clear advantage of being deterministic and faster, but only very few similarity measures that can be applied to mixture-based HMMs have been proposed so far. Most of these metrics apply to discrete or Gaussian HMMs, and to the best of our knowledge no comparative study has been conducted. With the recent development of HMMs based on the Dirichlet and generalized Dirichlet distributions for proportional data modeling, we propose three new parametric similarity measures between these HMMs. Extensive experiments on synthetic data show the reliability of these new measures in cases where the existing ones fail to give the expected results when some parameters vary. Illustrations on real data show the clustering capability of these measures and their potential applications
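    The general shape of a parameter-based (data-free) HMM measure can be sketched as follows: combine a distance between the transition matrices with a per-state distance between emission parameters. This is a generic illustration, not the paper's measures; in particular, the Euclidean emission distance below stands in for the distances between Dirichlet and generalized Dirichlet densities that the paper actually designs.

```python
# Sketch of a parameter-based distance between two HMMs with the same
# number of states: no data or forward-algorithm runs are needed, only
# the model parameters themselves.
import math

def hmm_param_distance(A1, A2, E1, E2):
    """A1, A2: row-stochastic transition matrices (lists of rows).
    E1, E2: per-state emission parameter vectors.
    Assumes the state correspondence between the two models is known."""
    n = len(A1)
    # L1 discrepancy between transition matrices
    d_trans = sum(abs(A1[i][j] - A2[i][j]) for i in range(n) for j in range(n))
    # Euclidean discrepancy between matched emission parameters
    d_emit = sum(math.dist(E1[i], E2[i]) for i in range(n))
    return d_trans + d_emit

A1 = [[0.9, 0.1], [0.2, 0.8]]
A2 = [[0.7, 0.3], [0.4, 0.6]]
E1 = [[2.0, 5.0], [7.0, 1.0]]
E2 = [[2.5, 5.0], [7.0, 1.5]]
print(hmm_param_distance(A1, A2, E1, E2))
```

    The deterministic, closed-form character of such measures is exactly the advantage the abstract attributes to the parameter-based class: no sampling of observation sequences is required.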

    Variational learning of a Dirichlet process of generalized Dirichlet distributions for simultaneous clustering and feature selection

    This paper introduces a novel enhancement for unsupervised feature selection based on generalized Dirichlet (GD) mixture models. Our proposal extends the finite mixture model previously developed in [1] to the infinite case via Dirichlet process mixtures, which can actually be viewed as a purely nonparametric model since the number of mixture components can grow as data are introduced. The infinite assumption is used to avoid the problems related to model selection (i.e. determining the number of clusters) and allows simultaneous separation of the data into similar clusters and selection of the relevant features. The resulting model is learned within a principled variational Bayesian framework that we have developed. The experimental results reported for both synthetic data and challenging real-world applications involving image categorization, automatic semantic annotation and retrieval show the ability of our approach to provide accurate models by distinguishing between relevant and irrelevant features without over- or under-fitting the data

    MML-Based Approach for Finite Dirichlet Mixture Estimation and Selection

    This paper proposes an unsupervised algorithm for learning a finite Dirichlet mixture model. An important part of the unsupervised learning problem is determining the number of clusters which best describes the data. We consider here the application of the Minimum Message Length (MML) principle to determine the number of clusters. The model is compared with results obtained by other selection criteria (AIC, MDL, MMDL, PC and a Bayesian method). The proposed method is validated on synthetic data and on the summarization of a texture image database
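    The selection procedure above can be sketched as a loop: fit candidate mixtures with k = 1..K components, score each fit with a message-length-style penalized likelihood, and keep the minimiser. The exact MML message length is more involved than this; the two-part score below (negative log-likelihood plus a parameter-cost penalty) is a simplified stand-in, and the fitted likelihood values are hypothetical EM outputs.

```python
# Sketch of message-length-style model selection over the number of
# mixture components. A likelihood gain must outweigh the cost of
# encoding extra parameters for a larger k to win.
import math

def message_length(neg_log_lik, n_params, n_samples):
    """Two-part code: data cost plus ~log(n)/2 nats per free parameter."""
    return neg_log_lik + 0.5 * n_params * math.log(n_samples)

def select_k(fits, n_samples):
    """fits: {k: (neg_log_lik, n_free_params)} from separate EM runs."""
    scores = {k: message_length(nll, p, n_samples)
              for k, (nll, p) in fits.items()}
    return min(scores, key=scores.get), scores

# hypothetical EM results on 500 points: the likelihood keeps improving
# with k, but the parameter-cost penalty eventually dominates
fits = {1: (1450.0, 3), 2: (1320.0, 7), 3: (1311.0, 11), 4: (1309.5, 15)}
best, scores = select_k(fits, 500)
print(best)  # -> 2
```

    The competing criteria the abstract mentions (AIC, MDL, MMDL, PC) follow the same fit-then-penalize pattern and differ mainly in the form of the penalty term.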

    A variational Bayes model for count data learning and classification

    Several machine learning and knowledge discovery approaches have been proposed for count data modeling and classification. In particular, latent Dirichlet allocation (LDA) (Blei et al., 2003a) has received a lot of attention and has been shown to be extremely useful in several applications. Although LDA is generally accepted to be one of the most powerful generative models, it is based on the Dirichlet assumption, which has some drawbacks as we shall see in this paper. Thus, our goal is to enhance LDA by considering the generalized Dirichlet distribution as a prior. The resulting generative model is named latent generalized Dirichlet allocation (LGDA) to maintain consistency with the original model. The LGDA is learned using variational Bayes, which provides computationally tractable posterior distributions over the model's hidden variables and its parameters. To evaluate the practicality and merits of our approach, we consider two challenging applications, namely text classification and visual scene categorization

    Variational-Based Latent Generalized Dirichlet Allocation Model in the Collapsed Space and Applications

    In the topic modeling framework, the performance of many Dirichlet-based models has been hindered by the limitations of the conjugate prior. This has led to models with more flexible priors, such as the generalized Dirichlet distribution, that tend to capture semantic relationships between topics (topic correlation). However, these extensions also suffer from incomplete generative processes that complicate traditional inference techniques such as VB (Variational Bayes) and CGS (Collapsed Gibbs Sampling). As a result, the new approach, CVB-LGDA (Collapsed Variational Bayesian inference for the Latent Generalized Dirichlet Allocation), presents a scheme that integrates a complete generative process with a robust inference technique for topic correlation and codebook analysis. Its performance in image classification, facial expression recognition, 3D object categorization, and action recognition in videos shows its merits

    BIDIRECTIONAL LSTM AND KALMAN FILTER FOR PASSENGER FLOW PREDICTION ON BUS TRANSPORTATION SYSTEMS

    Forecasting travel demand is a complex problem facing public transit operators. Passenger flow prediction is useful not only for operators, who use it for long-term planning and scheduling, but also for transit users. The time is quickly approaching when short-term passenger flow prediction will be expected as a matter of course by transit users. To address this expectation, a Bi-directional Long Short-Term Memory Neural Network model (BDLSTM NN) and a Bi-directional Long Short-Term Memory Neural Network Kalman Filter model (BDLSTM KF) predict short-term passenger flow based on the dependencies between passenger counts and spatial-temporal features. A comprehensive preprocessing framework is proposed that leverages historical data and extracts bidirectional features of passenger flow. The proposed model is based on [1] but adapted, applied, and analysed to produce optimal results for passenger flow forecasting on a bus route. Building on [2], a BDLSTM architecture is then combined with a Kalman filter. The Kalman filter reduces the training and testing complexity required for passenger flow forecasting. The BDLSTM-based Kalman filter produces predictions with less uncertainty than either method alone. Evaluating the BDLSTM-based Kalman filter with two months of real-world data, collected one year apart, shows positive improvements for short-term forecasting in high-complexity bus networks. The BDLSTM outperforms traditional machine and deep learning techniques used in this context
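    The fusion step described above, where the Kalman filter corrects a network prediction with the latest observation, can be sketched in scalar form. The paper's state and noise models are richer than this; the variance values below are illustrative only, and the "BDLSTM prediction" is just a placeholder number.

```python
# Sketch of a scalar Kalman update fusing a model prediction (e.g. the
# BDLSTM's forecast of the next passenger count) with an observed count.
# The filter weighs the two by their uncertainties and always ends up
# with a lower variance than the prediction alone.

def kalman_update(x_pred, p_pred, z, r):
    """x_pred, p_pred: predicted flow and its variance;
    z: observed passenger count; r: observation-noise variance."""
    k = p_pred / (p_pred + r)          # Kalman gain in [0, 1)
    x = x_pred + k * (z - x_pred)      # corrected estimate
    p = (1.0 - k) * p_pred             # reduced uncertainty
    return x, p

# equally trusted prediction and observation: estimate lands halfway,
# variance halves
x, p = kalman_update(x_pred=120.0, p_pred=25.0, z=130.0, r=25.0)
print(x, p)  # -> 125.0 12.5
```

    This is the sense in which the combined BDLSTM KF model "produces predictions with less uncertainty than either method alone": the posterior variance p is always smaller than the prior variance p_pred.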